12 research outputs found

    Logical Reasoning over Natural Language as Knowledge Representation: A Survey

    Logical reasoning is central to human cognition and intelligence. It includes deductive, inductive, and abductive reasoning. Past research on logical reasoning within AI used formal languages as the knowledge representation and symbolic reasoners. However, reasoning with formal languages has proved challenging (e.g., brittleness and the knowledge-acquisition bottleneck). This paper provides a comprehensive overview of a new paradigm of logical reasoning, which uses natural language as the knowledge representation and pretrained language models as reasoners, covering the philosophical definition and categorization of logical reasoning, the advantages of the new paradigm, benchmarks and methods, challenges of the new paradigm, possible future directions, and relations to adjacent NLP fields. This new paradigm is promising since it not only alleviates many challenges of formal representation but also has advantages over end-to-end neural methods. The survey focuses on transformer-based LLMs explicitly working on deductive, inductive, and abductive reasoning over English representations.
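
    To make the paradigm concrete, here is a minimal sketch (not from the survey itself) of natural language serving as the knowledge representation with a pretrained LM as the reasoner. It assumes the `transformers` library and the public `google/flan-t5-base` checkpoint; the prompt format is purely illustrative.

```python
from transformers import pipeline

# Any instruction-tuned text-to-text model can stand in here.
reasoner = pipeline("text2text-generation", model="google/flan-t5-base")

def deduce(facts: list[str], rules: list[str], hypothesis: str) -> str:
    # Knowledge stays in natural language; nothing is parsed into formal logic.
    context = " ".join(facts + rules)
    prompt = (f"Context: {context}\n"
              f"Question: Is the following statement true or false? {hypothesis}\n"
              f"Answer:")
    return reasoner(prompt, max_new_tokens=5)[0]["generated_text"]

# A deduction in the style of benchmarks such as RuleTaker:
print(deduce(["Erin is a cat."], ["All cats are animals."], "Erin is an animal."))
```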

    Finding the Pillars of Strength for Multi-Head Attention

    Recent studies have revealed issues with Multi-Head Attention (MHA), e.g., redundancy and over-parameterization. Specifically, the heads of MHA were originally designed to attend to information from different representation subspaces, yet prior studies found that some attention heads likely learn similar features and can be pruned without harming performance. Inspired by minimum-redundancy feature selection, we assume that focusing on the most representative and distinctive features with minimum resources can mitigate these issues and lead to more effective and efficient MHAs. In particular, we propose Grouped Head Attention, trained with a self-supervised group constraint that groups attention heads, where each group focuses on an essential but distinctive feature subset. We additionally propose a Voting-to-Stay procedure to remove redundant heads, thus achieving a lighter-weight transformer. Our method achieves significant performance gains on three well-established tasks while considerably compressing parameters. Comment: In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2023).
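
    As a rough illustration of what such a group constraint could look like (a sketch under our own assumptions, not the authors' training objective), the loss below pulls per-head features within a group together and pushes features across groups apart; the group assignment and margin are invented for the example.

```python
import torch
import torch.nn.functional as F

def group_constraint_loss(head_feats: torch.Tensor,
                          groups: torch.Tensor,
                          margin: float = 0.5) -> torch.Tensor:
    """head_feats: (H, d) pooled per-head features; groups: (H,) group ids."""
    # Pairwise cosine similarity between all heads.
    sim = F.cosine_similarity(head_feats.unsqueeze(0),
                              head_feats.unsqueeze(1), dim=-1)  # (H, H)
    same = groups.unsqueeze(0) == groups.unsqueeze(1)
    eye = torch.eye(len(groups), dtype=torch.bool)
    intra = sim[same & ~eye]   # within-group pairs: push similarity up
    inter = sim[~same]         # cross-group pairs: push similarity down
    return (1.0 - intra).mean() + F.relu(inter - margin).mean()

# e.g. 8 heads in 4 groups of 2 (hypothetical assignment):
# loss = group_constraint_loss(feats, torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
```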

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics, and the depth and breadth of computational semantic processing research can be greatly improved with new technologies. In this survey, we analyze five semantic processing tasks, i.e., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing from the published version due to publication policies; please contact Prof. Erik Cambria for details.

    Intelligent Virtual Assistants with LLM-based Process Automation

    While intelligent virtual assistants like Siri, Alexa, and Google Assistant have become ubiquitous in modern life, they still face limitations in their ability to follow multi-step instructions and accomplish complex goals articulated in natural language. However, recent breakthroughs in large language models (LLMs) show promise for overcoming these barriers by enhancing natural language processing and reasoning capabilities. Though promising, applying LLMs to create more advanced virtual assistants still faces challenges such as ensuring robust performance and handling variability in real-world user commands. This paper proposes a novel LLM-based virtual assistant that can automatically perform multi-step operations within mobile apps based on high-level user requests. The system advances the state of assistants by providing an end-to-end solution for parsing instructions, reasoning about goals, and executing actions. LLM-based Process Automation (LLMPA) has modules for decomposing instructions, generating descriptions, detecting interface elements, predicting next actions, and error checking. Experiments demonstrate the system completing complex mobile operation tasks in Alipay based on natural language instructions, showcasing how large language models can enable automated assistants to accomplish real-world tasks. The main contributions are the novel LLMPA architecture optimized for app process automation, the methodology for applying LLMs to mobile apps, and demonstrations of multi-step task completion in a real-world environment. Notably, this work represents the first real-world deployment and extensive evaluation of an LLM-based virtual assistant in a widely used mobile application with a user base numbering in the hundreds of millions.
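
    As a speculative outline only, the following shows how such a module chain could be wired together. The prompts, the Action type, and every helper name here are hypothetical illustrations, not the LLMPA implementation.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # any text-in/text-out model call

@dataclass
class Action:
    element_id: str  # which on-screen element to act on
    operation: str   # e.g. "tap" or "type:<text>"

def decompose(llm: LLM, request: str) -> list[str]:
    # Instruction decomposition: one app-operation step per line.
    steps = llm(f"Split this request into app-operation steps, one per line:\n{request}")
    return [s.strip() for s in steps.splitlines() if s.strip()]

def predict_action(llm: LLM, step: str, screen_elements: list[str]) -> Action:
    # Interface-element detection + next-action prediction in one call.
    reply = llm(f"Step: {step}\nVisible elements:\n" + "\n".join(screen_elements)
                + "\nAnswer as '<element_id> <operation>'.")
    element_id, operation = reply.split(maxsplit=1)
    return Action(element_id, operation)

def run(llm: LLM, request: str,
        get_screen: Callable[[], list[str]],
        execute: Callable[[Action], bool]) -> None:
    for step in decompose(llm, request):
        action = predict_action(llm, step, get_screen())
        if not execute(action):  # error checking: one retry with a fresh screen read
            execute(predict_action(llm, step, get_screen()))
```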

    The diploid genome sequence of an Asian individual

    Get PDF
    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
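
    As a toy illustration of the kind of consensus call such a pipeline performs (not the paper's actual method; the depth and allele-fraction thresholds are invented), a site can be genotyped from its read pileup like so:

```python
from collections import Counter

def call_genotype(bases: str, min_depth: int = 8, het_frac: float = 0.25):
    """bases: read bases aligned to one reference position, e.g. 'AAAGAAGG'."""
    counts = Counter(bases.upper())
    depth = sum(counts.values())
    if depth < min_depth:
        return None                      # too shallow to call confidently
    (a1, n1), *rest = counts.most_common(2)
    if rest and rest[0][1] / depth >= het_frac:
        return tuple(sorted((a1, rest[0][0])))  # heterozygous, e.g. ('A', 'G')
    return (a1, a1)                      # homozygous consensus

print(call_genotype("AAAGAAGGAAAG"))     # ('A', 'G'): 8 A reads vs 4 G reads
```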

    Attention mechanism optimization for sub-symbolic-based and neural-symbolic-based natural language processing

    The capability for machines to transduce, understand, and reason with natural language lies at the heart of Artificial Intelligence, not only because natural language is one of the main mediums for information delivery, residing in documents, daily chats, and databases of various languages, but also because it involves many key aspects of intelligence (e.g., logic, understanding, abstraction). Empowering machines with more linguistic intelligence may benefit a wide range of real-world applications such as Machine Translation, Natural Language Understanding, and Dialogue Systems. At present, there are two popular streams of approaches for building intelligent Natural Language Processing (NLP) systems: sub-symbolic and neural-symbolic approaches. Sub-symbolic approaches learn implicit representations from unstructured corpora, which are massive in amount but leave the learned models with poor interpretability and reasoning ability; neural-symbolic approaches integrate neural and symbolic architectures to incorporate structured symbolic data (e.g., semantic nets, knowledge graphs) as an external knowledge source, which makes the learned models more interpretable and logical, but structured symbolic data is hard to represent fully and is comparatively scarce. Both streams therefore deserve study, since they have their respective strengths and weaknesses and work complementarily across tasks and scenarios. Meanwhile, attention-based models such as Transformers have achieved huge success in many NLP tasks, including Machine Translation, Language Modeling, and Question Answering. However, attention itself has many issues, such as redundancy, quadratic complexity, and weak inductive bias. Moreover, previous applications of attention-based models to various NLP tasks are problematic, e.g., omitting the prior attention distribution, incurring large computational complexity, or exhibiting weak long-term reasoning capability. To this end, this thesis explores novel attention architectures for NLP tasks that are currently based mainly on sub-symbolic or neural-symbolic approaches, to solve the existing issues and advance the state of the art. In particular, for sub-symbolic-based tasks, we study Machine Translation, Language Modeling, Abstractive Summarization, and Spoken Language Understanding; for neural-symbolic-based tasks, we study Dialogue Commonsense Reasoning. The main contributions of this thesis are as follows.

    First, we study the redundancy and over-parameterization issues of Multi-Head Attention (MHA). We find that, within a certain range, higher compactness of attention heads (i.e., intra-group heads become closer to each other while inter-group heads become farther apart) improves the performance of MHA, forcing it to focus on the most representative and distinctive features and providing guidance for future architectural designs. Accordingly, we propose a divide-and-conquer strategy that consists of Group-Constrained Training (GCT) and Voting to Stay (V2S), which mitigates the redundancy and over-parameterization issues of MHA. Our method uses fewer parameters and achieves better performance, outperforming existing MHA redundancy/parameter reduction methods. We verify our methods on three well-established NLP tasks (i.e., Machine Translation, Language Modeling, and Abstractive Summarization); superior results on datasets spanning multiple languages, domains, and data sizes demonstrate the effectiveness of our method.

    Second, we ease the modality and granularity inconsistency problem when distilling knowledge from a teacher understanding model to student models, by refining the attention hidden states based on the attention map distribution. We propose Attention-based Significance Priors (ASP) to improve semantic knowledge transfer from text to speech, and further propose the Anchor-based Adaptive Span Aggregation algorithm (AASA), which narrows the modal granularity gap of alignments. To the best of our knowledge, we are the first to evaluate multiple alignment strategies beyond vanilla global and local alignments to study the feasibility of metric-based speech-text distillation. Results on three spoken language understanding benchmarks (i.e., Intent Detection, Slot Filling, and Emotion Recognition) verify our assumptions and claims.

    Third, we improve the multi-source and long-term Dialogue Commonsense Reasoning (DCR) process, a new and difficult problem in NLP, by presenting a hierarchical attention-based decoding block. We propose the first Transformer-based KG walker that attentively reads multiscale inputs for graph decoding. Specifically, Multi-source Decoding Inputs (MDI) and an Output-level Length Head (OLH) are presented to strengthen the controllability and multi-hop reasoning ability of the Hierarchical Attention-based Graph Decoder (HAGD). We further propose a two-hierarchy learning framework to train the proposed hierarchical attention-based KG walker, so that it learns both turn-level and global-level KG entities as conversation topics. This is the first attempt to learn models that make natural transitions towards the global topic in a KG, for which we present a distance embedding to incorporate distance information. Moreover, we propose MetaPath (MP) to exploit entity and relation information concurrently during reasoning, which proves essential as the backbone method for KG path representation and provides a paradigm for KG reasoning. Results on the DCR dataset OpendialKG show that HiTKG achieves a significant improvement in turn-level reasoning performance compared with state-of-the-art baselines, and both automatic and human evaluation prove the effectiveness of the two-hierarchy learning framework for both short-term and long-term DCR.

    Doctor of Philosophy
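
    As one possible reading of the ASP idea (a sketch under our own assumptions, not the thesis's formulation), the distillation loss below weights each position's hidden-state mismatch by the attention mass the teacher assigns to it; the shapes and the head/query pooling are illustrative.

```python
import torch

def asp_distill_loss(student_h: torch.Tensor,    # (T, d) student hidden states
                     teacher_h: torch.Tensor,    # (T, d) teacher hidden states
                     teacher_attn: torch.Tensor  # (heads, T, T) teacher attention maps
                     ) -> torch.Tensor:
    # Significance prior: average attention mass each position receives,
    # pooled over heads and query positions, renormalized to sum to 1.
    sig = teacher_attn.mean(dim=0).sum(dim=0)
    sig = sig / sig.sum()
    # Per-position hidden-state mismatch, weighted by the prior.
    per_pos = ((student_h - teacher_h) ** 2).mean(dim=-1)  # (T,)
    return (sig * per_pos).sum()
```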

    HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning

    Human conversations are guided by short-term and long-term goals. We study how to plan short-term goal sequences as coherently as humans do and how to naturally direct them toward an assigned long-term goal in open-domain conversations. Goal sequences are series of knowledge graph (KG) entity-relation connections generated by KG walkers that traverse the KG. Existing recurrent and graph-attention-based KG walkers either insufficiently utilize the conversation states or lack global guidance. In our work, goal planning is learned in a hierarchical framework: we present HiTKG, a hierarchical transformer-based graph walker that leverages multiscale inputs to make precise and flexible predictions on KG paths. Furthermore, we propose a two-hierarchy learning framework that employs two stages to learn both turn-level (short-term) and global-level (long-term) conversation goals. Specifically, in the first stage, HiTKG is trained in a supervised fashion to learn how to plan turn-level goal sequences; in the second stage, HiTKG learns to naturally approach the assigned global goal via reinforcement learning. In addition, we propose MetaPath as the backbone method for KG path representation, exploiting entity and relation information concurrently. We further propose Multi-source Decoding Inputs and an Output-level Length Head to improve decoding controllability. Our experiments show that HiTKG achieves a significant improvement in turn-level goal learning performance compared with state-of-the-art baselines. Additionally, both automatic and human evaluation prove the effectiveness of the two-hierarchy learning framework for both short-term and long-term goal planning.
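
    A minimal sketch of MetaPath-style path representation as described (our own assumptions, not HiTKG's code): a KG path is encoded as the interleaved sequence of entity and relation embeddings, so the walker sees both at every hop. Vocabulary sizes, dimensions, and ids are illustrative.

```python
import torch
import torch.nn as nn

class MetaPathEncoder(nn.Module):
    """Encodes a KG path e0 -r0-> e1 -...-> eL as [e0, r0, e1, r1, ..., eL]."""
    def __init__(self, n_entities: int = 1000, n_relations: int = 50, dim: int = 64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def forward(self, entities: torch.Tensor, relations: torch.Tensor) -> torch.Tensor:
        # entities: (L+1,) ids; relations: (L,) ids -> (2L+1, dim) sequence.
        e, r = self.ent(entities), self.rel(relations)
        interleaved = torch.stack([e[:-1], r], dim=1).flatten(0, 1)  # e0,r0,...,r(L-1)
        return torch.cat([interleaved, e[-1:]])  # append the final entity eL

# Path (e3 -r1-> e7 -r4-> e9), ids hypothetical; feed the result to the walker:
# seq = MetaPathEncoder()(torch.tensor([3, 7, 9]), torch.tensor([1, 4]))
```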

    Fusing Task-Oriented and Open-Domain Dialogues in Conversational Agents

    DOI: 10.1609/aaai.v36i10.21416. Thirty-Sixth AAAI Conference on Artificial Intelligence / Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence / Twelfth Symposium on Educational Advances in Artificial Intelligence, vol. 36, no. 10, pp. 11622-1162

    Recent advances in deep learning based dialogue systems: a systematic survey

    Dialogue systems are a popular natural language processing (NLP) task because they are promising in real-life applications. They are also complicated, since many NLP tasks deserving study are involved. As a result, a multitude of novel works on this task have been carried out, most of them deep learning based due to their outstanding performance. In this survey, we mainly focus on deep learning based dialogue systems. We comprehensively review state-of-the-art research outcomes in dialogue systems and analyze them from two angles: model type and system type. Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems. This will help researchers acquaint themselves with these models and see how they are applied in state-of-the-art frameworks, which is rather helpful when designing a new dialogue system. From the angle of system type, we discuss task-oriented and open-domain dialogue systems as two streams of research, providing insight into the related hot topics. Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research. Finally, some possible research trends are identified based on recent research outcomes. To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present for deep learning based dialogue systems, extensively covering the popular techniques. We speculate that this work is a good starting point for academics who are new to dialogue systems or those who want to quickly grasp up-to-date techniques in this area. Agency for Science, Technology and Research (A*STAR): this research/project is supported by A*STAR under its Industry Alignment Fund (LOA Award I1901E0046).